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Takens' Embedding Theorem remarkably established that concatenating M previous outputs of a 
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dynamical system into a vector (called a delay coordinate map) can be a one-to-one mapping of a low- 
dimensional attractor from the system state space. However, Takens' theorem is fragile in the sense that 
even small imperfections can induce arbitrarily large errors in this attractor representation. We extend 
Takens' result to establish deterministic, explicit and non-asymptotic sufficient conditions for a delay 
coordinate map to form a stable embedding in the restricted case of linear dynamical systems and 
observation functions. Our work is inspired by the field of Compressive Sensing (CS), where results 
guarantee that low-dimensional signal families can be robustly reconstructed if they are stably embedded 
by a measurement operator. However, in contrast to typical CS results, i) our sufficient conditions are 
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0^ ' independent of the size of the ambient state space, and ii) some system and measurement pairs have 



fundamental limits on the conditioning of the embedding (i.e., how close it is to an isometry), meaning 
that further measurements beyond some point add no further significant value. We use several simple 
simulations to explore the conditions of the main results, including the tightness of the bounds and the 
convergence speed of the stable embedding. We also present an example task of estimating the attractor 
dimension from time-series data to highlight the value of stable embeddings over traditional Takens' 
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embeddings. 
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I. Introduction 

Of the many types of data confronting signal processing researchers, time series data is perhaps one 
of the most common. While there are many possible ways to analyze a time series, one of the most 
important tasks in many areas of science and engineering is to characterize (or predict) the state of 
a dynamical system from a stream of its output data [2,3]. This type of state identification can be 
particularly challenging because the internal (possibly high-dimensional) system state x(t) G R N is 
often only indirectly observed via a one-dimensional time series of measurements produced through an 
observation function s(t) = h(x(t)), where h : R N — > R. 

Surprisingly, when the dynamical system has low-dimensional structure because the state is confined 
to an attractor M of dimension d (d < N) in the state space, Takens' Embedding Theorem [4,5] shows 
that complete information about the hidden state of this system can be preserved in the time series output 
data s(t). Indeed, many systems of interest do have this type of structure [6], and a variety of algorithms 
for tasks such as time series prediction and attractor dimension estimation exploit Takens' result [3]. 
Specifically, Takens defined the delay coordinate map F : R N — > R M as a mapping of the state vector 
x(t) to a point in the reconstruction space (R M ) by taking M uniformly spaced samples of the past time 
series (with sampling interval T s ) and concatenating them into a single vector, 

F(x(t)) = [s(t) s(t - T a ) s(t - 2T S ) ... S (t-(M- l)T s )f . (1) 

Takens' main result [4] (later refined in [5]) states that (under a few conditions on T s discussed later) for 
almost every smooth observation function h(-), the delay coordinate map is an embedding 1 of the state 
space attractor M. when M > 2d. In other words, despite the state being hidden from direct observation, 
the topology of the attractor that characterizes the dynamical system can be preserved in the time series 
data when it is arranged into a delay coordinate map. 

In the absence of imperfections such as measurement or system noise, Takens' result indicates that a 
delay coordinate map should be as useful for characterizing a system as direct observation of the hidden 
system state. However, in the presence of noise, a one-to-one mapping may not be sufficient to guarantee 
the robustness of any processing performed in the reconstruction space (e.g., dimensionality estimation). 
The main underlying problem is that while Takens' theorem guarantees the preservation of the attractor's 
topology, it does not guarantee that the geometry of the attractor is also preserved. For example, Takens' 

1 An embedding is a one-to-one immersion. 
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result guarantees that two points on the attractor M do not map to the same point in the reconstruction 
space, but there are no guarantees that close points on the attractor remain close under this mapping 
(or far away points remain far away). Consequently, relatively small imperfections could have arbitrarily 
large effects when the delay coordinate map is used in applications. 

In the signal processing community, recent work has highlighted the importance of well-conditioned 
measurement operators to ensure the geometry of a low-dimensional signal family is preserved. Consider 
a signal class M with intrinsic dimension d residing in 1^ and measurement operator F : R N — > R M . 
We call F to be a stable embedding of the signal class M if for all distinct pairs of points x,y G M 
their pairwise distances are preserved by satisfying 

C(l-5)< ^-^ <C(l + 6). (2) 

The scaling constant C could be absorbed into F and the conditioning number < 5 < 1 bounds how 
much pairwise distances between signals in M can change when mapped by F (i.e., how near F is to 
an isometry). The Johnson-Lindenstrauss (JL) lemma [7, 8] gives an example of a stable embedding of 
a signal class M consisting of a point cloud of d = \M\ distinct points in R N . In this result, a random 
measurement matrix F with M = 0(\og(d)) rows ensures that (2) holds with high probability for all pairs 
of points in the point cloud M. Another example is the recent work in the field of compressed sensing 
(CS) [9, 10], where the canonical results show that similar random matrices F satisfy the Restricted 
Isometry Property (RIP) with high probability when M = 0(d\og(N/d)) [11,12]. The RIP guarantees 
that (2) holds for all pairs of ti-sparse signals (i.e., the signal family A4 is comprised of signals on the 
union of all ti-dimensional subspaces within R ). Beyond extending the concept of the JL lemma from 
a finite point cloud to an infinite signal family, the CS results show the value of stable measurement 
operators by also making guarantees about efficient and robust signal recovery from these measurements. 
The notion of a stable embedding has also been extended to other signal models [13], including manifold 
signal families [14, 15]. The latter can be seen as an extension of Whitney's Embedding Theorem [16]; 
while Whitney's Embedding Theorem ensures a one-to-one mapping of a manifold M. with dimension 
d for almost any smooth projection function F given that M > 2d, the results in [14] further guarantee 
that (2) holds over this signal family for a given S with high probability when M = 0(dlog(N)) and F 
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is a random orthoprojector. 2 

While the notion of embedding the state of a dynamical system may seem far removed from the 
CS results, there is actually a close connection. It is well-known that Takens' Embedding Theorem can 
be viewed as a special case of Whitney's Embedding Theorem where the measurement operator F is 
restricted to forming a delay coordinate map (i.e., F = F) and Ai is taken to be the state space attractor 
(i.e., M. = M) [3]. The main contribution of this paper is to further these connections by establishing 
sufficient conditions whereby the delay coordinate map is a stable embedding of the state space attractor 
for linear systems with linear observations functions. Indeed, the main technical result of this paper 
establishes deterministic, explicit and non-asymptotic sufficient conditions for the delay coordinate map 
to be a stable embedding with a given conditioning 5. We also explore the meaning of these conditions 
for characterizing systems via delay coordinate maps. In particular, the results of this exploration are 
interesting because they contrast with the standard CS results in two principle ways: (i) the conditioning 
of the operator cannot always be improved by taking more measurements, as some system/observation 
pairs will have a fundamental limit in how well the system geometry can be preserved, and (ii) the 
necessary number of measurements scales with the dimension of the attractor d but is independent of the 
dimension of the ambient space N. 

Due to the importance of nonlinear systems, a similar general stable embedding result for nonlinear 
dynamical systems is obviously of great interest. Linear systems have a wealth of tools available for their 
analysis and the language of "attractors" is uncommon when studying these relatively simple systems 
(despite the notion of an attractor being technically well-posed for the restricted class of linear systems 
we study here). Therefore, beyond just contributing a new tool for linear systems analysis and design 
(as demonstrated in the example of Section IV-C), our present results are perhaps most valuable for 
elucidating some of the unique issues that arise when trying to stabilize the embeddings of dynamical 
systems, helping to pave the way for extensions to nonlinear systems. 

II. Background and Related Work 

In this section we will briefly review some preliminaries, including a precise statement of Takens' 
theorem, attractors of linear systems, and related work in stable embeddings of attractors and manifolds. 

2 The required number of measurements M in [14] also depends on some properties of the manifold (e.g., the maximum 
curvature). Clarkson [15] later improved upon M to remove the dependence on the ambient dimension N and reduce the 
dependence on certain properties of the manifold. 
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A. Linear Systems and Delay Coordinate Maps 

Let a dynamical system be denned by the differential equation: 

x(t) = * (x(t)) , (3) 

where x(t) G 1^ is the system state at time t, and <I> : R N — > R N is a smooth function. As stated 
earlier, in this paper we will restrict our examination to embeddings of linear dynamical systems where 
^ G R NxN is a matrix. Before going on, our discussion of these systems will require us to establish a 
basic notation for complex vector spaces. For u=[u\ ■ ■ ■ un] T G C^, we denote the complex variable 
by j, the (element-wise) complex conjugate by u* and the Hermitian transpose by u H = (u*) T . 

Given the system matrix and the definition of a dynamical system (3), knowing the state at some 
fixed time to is equivalent to knowing the path that the system takes to and from that state (called the 
flow). Classic results in linear systems theory [17] show that the explicit solution for this path is given 
by a matrix multiplication: x(to + t) = e q>t x(to) = &t%(to)> where 3>t = e** is the flow matrix. Note that 
this solution is valid for positive or negative values of t, describing the flow both forward and backward 
from time to. 

Delay coordinate maps that embed points on the attractor of a dynamical system are intimately 
connected with the flow of the system approaching that point. In particular, forming a delay coordinate 
map of a specific point in the state space requires collecting samples of the system flow backward in time 
from that point at regular intervals T s . To enable mathematical descriptions of this sampling operation 
along the flow, we suppress the implicit dependence on the sampling time T s and define the compact 
notation for the flow matrix as <3? = <£_t s so that x(t — T s ) = Qx(t). The delay coordinate map F with 
M delays given in (1) for the case of linear dynamical systems and linear observation functions h G M. N 
can then be written asaMxJV matrix: 

F= (h\ $ T h\ ••• | (<S> M - r fh) T . (4) 

To ensure that the linear dynamical systems under consideration have non-trivial steady-state behavior 
(i.e., oscillations rather than convergence to a fixed point), we restrict our study to the class of systems 
A(d) described in the following definition. 

Definition 11.1. We say that a linear dynamical system in M. N defined by (3) is of Class A(d) for d < ^ 
if the system matrix * is real, full rank and has distinct eigenvalues. Moreover, \I> has only d strictly 
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imaginary conjugate pairs of eigenvalues and the rest of its eigenvalues have real components strictly 
less than 0. The strictly imaginary conjugate pairs of eigenvalues are called the A-eigenvalues and they 
can be expressed as {±j#i}f =1 where 9±, ■ ■ ■ ,9d > are d distinct numbers. The corresponding unit- 
norm A-eigenvectors are v±,v±*,--- ,Vd,Vd*. The corresponding eigenvalues of the flow matrix $ are 
called the A^-eigenvalues, and are given by {e ±J iTs }f =1 . 

Furthermore, we define A = diag (j9\, — j9\, . . . , j9d, — j9d) as the diagonal matrix composed of the 
^-eigenvalues and V = (v± | v\* \ ■ ■ ■ \ v d | Vd*) G C Nx2d as the concatenation of the ^-eigenvectors 
into a matrix with rank(V) = 2d. Since <I> is the matrix exponential of *I>, it is well-known that they 
share the same eigenvectors [18]. Therefore, if we denote D = D_t s = e~ ATs as the diagonal matrix 
comprised of the ^^-eigenvalues, then we have = VD. 

In order to have a meaningful notion of an embedding, the dynamical system must have its state 
trajectory confined to a low-dimensional attractor in the state space. Even if the system has transient 
characteristics from a given starting point, the embedding of a system is only considered in steady-state 
when these transients have disappeared. Considering the steady-state dynamics of the system, we make 
explicit the notion of an attractor through the following definition. 

Definition II.2. Let a linear dynamical system be of class A(d) and let xq = Vao G l w for some 
ao G C 2d be an arbitrary initial state of the system. 4 We define the attractor of this linear dynamical 
system to be M = {x G R N \ x = Ve At a , t G M}. 

It is easy to see that M lives in the span of V. Also, the attractor of the system clearly depends on the 
initial state of the system. Because the main results of this paper do not depend on the choice of initial 
state, we will simply refer to the fixed attractor as M and suppress the implicit dependence on the initial 
state. Additionally, one can check that this definition meets the fundamental notion of an attractor, i.e., 
that any point on the attractor M when projected backwards (or forward) in time by $ will remain on 
M. Specifically, for any x G M, we can write x = Va x , where a x = e A ' x ao f° r some t x G M. Then 
we see that for some D (the diagonal matrix comprised of the A$> -eigenvalues as defined earlier) and 
any k G Z, § k x = <Z> k Va x = VD k a x , meaning that x remains on the attractor even when it is projected 
forward or backward in time. Finally, while we will not show this in detail due to space constraints, one 

3 A number x is strictly imaginary if Re{x} = 0. This condition ensures that the system modes corresponding to these 
eigenvectors have persistent oscillation in the steady-state response. 

4 We only need to consider xo in the span of the columns of V because any orthogonal components vanish in steady-state. 
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(a) (b) 

Fig. 1. Examples of attractors of linear dynamical systems of class A(d) in M. N for N = 2 and d = 1 with 
sampling interval T s = 1. (a) A system attractor when = j and v = [1, j] T . This results in a circular attractor 
where the system progresses at an angular speed determined by 9. (b) A system attractor when — j and v = 
[0.8165 + 0.4082j, — 0.4082j] T . Here the system also progresses at the same angular speed, but the attractor is now 
an ellipse. 

can show that for each i the state x(t) is moving in an elliptical orbit on the span of Re {vi} and Im {vi} 
with angular speed proportional to 9iT s . 

For clarity and to build intuition, we give two brief examples where N = 2, d = 1 and T s = 1. 
For the first example, consider a dynamical system of class A(d) with ^4-eigenvalue 9 = f and A- 
eigenvector v = -^[1, j] T - Shown in Figure 1(a) is the resulting circular attractor of this system, 
along with the real and imaginary components of the ^-eigenvector and a pair of states separated in 
time by T s (which corresponds to a separation of 9T S in angle). For the second example, consider a 
dynamical system of class A(d) with the same parameters except that the ^-eigenvector is now defined 
as v = [0.8165 + 0.4082j, -0.4082j] T . Shown in Figure 1(b) is the resulting elliptical attractor and 
state time samples, illustrating that the angular speed is unchanged at 9T S . In both of these examples, the 
elongation of the ellipse is determined by the inner product between Re{v} and Im{u}, which governs 
how well the attractors fill the dimensions of the state space that it occupies. While this is intuitive to 
visualize in the present case of d = 1, for general d > 1 this elongation is determined by the ratio 
between the smallest and largest eigenvalues of V V, denoted A\ and A2, respectively. When A\ = A2, 
the system state revolves around a circle when projected onto each of the subspaces spanned by Re{v p } 
and Im{fp} for p = 1, • • , d, and the resulting attractor is a product of these circular orbits. However 
when A2 S> A\, the projection of the attractor onto some (or all) of these subspaces will be a highly 
elongated ellipse, therefore not equally filling the dimensions of the state space that it occupies. 
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B. Attractor Embeddings 

The following theorem is an extension of Takens' original result [4], and gives a lower bound on the 
number of measurements M sufficient to ensure that a delay coordinate map F defined as in (1) is a 
one-to-one mapping from the state space attractor to the measurement (reconstruction) space. 

Theorem II. 1 (Takens' Embedding Theorem [5]). Assume the dynamical system converges to an attractor 
Ai of dimension d and pick a sampling interval T s > 0. Let M > 2d and suppose Ai has a finite number 
of equilibria, no periodic orbits of \I/ of period T s or 2T S , and at most finitely many periodic orbits of 
period kT s for k = 3, • • • , M. Then for almost every smooth function h, the delay-coordinate map F is 
one-to-one on Ai. 

The notion of "almost every" used in the theorem above is technical (see [5] for details), but is consistent 
with the heuristic notion that out of all possible functions h, most will indeed work. 

In this paper we consider the question of when the one-to-one property described in Theorem II. 1 can 
be improved to become a stable embedding where F is (nearly) an isometry that preserves the geometry 
of Ai. Specifically, we introduce the following definition to formalize the notion of a stable embedding. 

Definition II.3. Suppose we have a dynamical system in M. N that converges to an attractor Ai and a 
linear map F : M N — > R M . We say that F is a stable embedding of Ai with conditioning 5 if for all 
x,y G Ai and for some scaling constant C, we have 

c(1 _ s)< mf-m\\l <c(1+s) 

Note that smaller values of 5 in the above definition imply a more stable embedding because it guarantees 
that the map is closer to an isometry. We also note that preservation of Euclidean distances also implies 
that the geodesic distances between points on the attractor are preserved [14]. Because Taken's result 
only tells us that the delay coordinate map F is a one-to-one mapping, it does not guarantee any specific 
value of the conditioning, meaning that S could be arbitrarily close to 1 and the embedding could be 
highly unstable. 

To see why Takens' Embedding can be insufficient, we present an illustrative example where the 
conditioning of the embedding can be made arbitrarily bad when M is the minimum number of delays 
necessary to satisfy the sufficient conditions of Theorem II. 1. Consider a linear system of class .4.(1) with 
N = 2, T s = 1, .4-eigenvalue 6 = 0.03 and ^-eigenvector v = -4= [1, j] T . This system has a circular 
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(a) (b) (c) 

Fig. 2. Examining the conditioning ofTakens' embeddings. (a) The large (blue) circle shows the attractor of the linear 
system. The (black) diamond and (red) circle markers show 2 different points x, y that we pick on the opposite ends 
of the attractor. The arrow depicts the measurement function h(e). (b) The graph shows Q(x, y) for the points x, y in 
Figure 2(a) over a range of values of e from 0.01 to 0.1. The number of measurements M is fixed at 3, the minimum 
required by Takens' theorem, (c) Here Q(x, y) is plotted forM ranging from 3 to 400 (with e fixed at 0.1), suggesting 
a near isometry for F as M increases. 

attractor as depicted in Figure 2(a). We set the observation function to be h = yjfeiVt) VI ~~ e ] T - 5 
Given a particular pair of points x, y on opposite ends of the circular attractor (shown in Figure 2(a)), we 
examine the ratio Q(x, y) = ^^~^f^ 2 , where F is the delay coordinate map given in (4). Note that if 
F is a perfect isometry then Q(x, y) = 1, and we must have Q(x, y) > for F to be one-to-one. Fixing 
the number of measurements at M = 3 (the minimum required by Takens' theorem), Figure 2(b) shows 
the behavior of Q(x, y) for this pair of points as a function of e. We see that while meeting the sufficient 
conditions of Takens' Theorem, lining Q(x, y) = 0. Stated another way, by adjusting the parameter e 
the conditioning of F can be made arbitrarily bad for this pair of points. To see that this is not simply 
a bad pairing of the measurement function to the system, note that for any admissible choice of h there 
would exist a pair of points that would behave the same way. 6 To explore this example further, Figure 
2(c) plots Q(x,y) with e = 0.1 and varying M from 3 to 400. We see that with increasing M, the 
ratio Q(x, y) increases, oscillates and converges to a value of C = 1. This provides evidence suggesting 
that as M increases, the conditioning of F improves because the distance between this pair of points is 
preserved with increasing fidelity. This effect is not predicted by Theorem II. 1 , but will be shown in our 
main results in Section III-B. 

5 As will be described in Theorem III. 2, the observation function is normalized so that we have scaling constant of C = 1 
regardless of M. 

6 One can imagine this by rotating the points x, y by an angle equivalent to the angle between the new measurement function 
and the given h. 
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C. Related Work 

Independently but at nearly the same time as Takens' original work, Aeyels [19] looked at the same 
problem from a control theory standpoint. He showed that the delay-coordinate map is related to the 
observability criteria and that given any system in N dimensions (not just one confined to an attractor), a 
generic choice of observation function h guarantees that the system is observable as long as M > 2N + 1. 
Similar to the idea of a stable embedding, the authors in [20] developed a robustness measure for the 
observability of dynamical systems. Stated in the language of delay coordinate maps and sampled systems, 
they defined a system as observable with precision (e, S) if for any two states x, y on a trajectory in the 
state space, \\F{x) — F(y)||2 < e implies \\x — j/ 1| 2 < S. In addition to Takens' original investigation of 
attractor embeddings [4], significant advances were made by Sauer et al. [5] to extend these results to 
include attractors of non-integer dimensions (i.e., strange attractors) and to make the definition of "almost 
every" more in line with notions of an event that occurs with probability one. Our preliminary results 
showing conditions for a stable embedding for linear systems of class .4(1) were reported in [1]. 

There has also been significant prior work related to embedding manifolds (or fractal sets), which 
has important implications for attractor embeddings. Specifically, embedding results for manifolds were 
derived by Whitney [16] and later expanded on by Sauer et al. [5]. These results show that if a manifold 
has dimension d, then almost every smooth function mapping into R M with M > 2d will be an embedding 
of the manifold. Baraniuk & Wakin [14] extended these results to show that for manifolds with dimension 
d embedded in R N , random orthoprojections into R M provide a stable embedding of the manifold as 
long as M scales linearly with d and logarithmically with N (depending also on various properties of 
the manifold, such as the maximum curvature). Clarkson [15] later improved on the required number of 
measurements M by removing the dependence on N and certain worst case properties of the manifold. We 
note that these stable embedding results have been used to show that manifold learning and dimensionality 
estimation algorithms can be performed in the compressed space with nearly the same accuracy as they 
could be performed in the original space [21]. The main distinction between these manifold embedding 
results and Takens' theorem is that these results acquire M independent observations of each single point 
on the manifold, whereas Takens' result requires the repeated application of a single observation function 
to a system having its own internal time variations. In essence, the delay coordinate map relies on the 
system dynamics to provide measurement diversity when the observations are restricted to a single fixed 
function h. 

One of the principle benefits of a stable delay coordinate map would be resilience to noise and other 
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imperfections. The effect of noise on the reconstruction of state space attractors has also been previously 
considered by several researchers apart from the notion of a stable embedding. In [22] the authors looked 
at a modified embedding theorem for systems corrupted by dynamical noise, considering specifically 
embeddings using multivariate time series system outputs and taking more measurements than is typically 
required for a delay-coordinate map. In [23], the authors study the effects of observational noise via 
statistical methods, showing how the choice of delay-coordinates (i.e., the choice of observation function 
h and sampling time T s with respect to the system dynamics) affects the ability to make predictions. In 
particular, they showed that poor reconstruction amplifies noise and increases estimation error. 

In related work, there has also been considerable research on the choice of the optimal sampling interval 
T s for the construction of the delay coordinate map (typically for the study of chaotic dynamical systems). 
In particular, one of the more successful techniques is choosing T s to minimize the mutual information 
between any two time series samples separated by T s [24]. The resulting reconstructed attractor usually 
makes the quantitative and qualitative study of the chaotic dynamics easier as the reconstructed trajectories 
tends to be unfolded to maximally fill the reconstruction space. In contrast, our goal is to characterize 
conditions on the system and observation functions (including but not limited to T s ) such that the geometry 
of the attractor is faithfully represented in the reconstruction space. 

III. Stable Embeddings for Linear Dynamical Systems 

In this section we present our main technical results. We first present a preliminary result in Sec- 
tion III-A that gives explicit sufficient conditions on the system and observation functions to guarantee 
that the delay coordinate map is a one-to-one map of the state space attractor. This is akin to Takens' 
Embedding Theorem, and we present it here to highlight the specific differences that arise under our 
restrictions (linear systems and measurement functions) and when seeking explicit conditions on system 
and measurement pairs (as opposed to the conditions for generic observation functions in Takens' 
theorem). We then present our main technical contribution in Section III-B, giving explicit conditions 
on the system and observation function for the delay coordinate map to be a stable embedding of the 
attractor with specific guarantees on the conditioning number of the embedding. 

A. Takens' Embeddings 

The following theorem gives conditions on the system and the observation function such that the delay 
coordinate map F is a one-to-one mapping. This is analogous to Theorem II. 1 in the context of linear 
dynamical systems and linear observation functions. 
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Theorem III.l (Linear Takens' Embedding [1]). Assume a linear dynamical system of class A{d) in R N 
that is in steady state. Choose T s > to be the sampling interval, h G M N to be the observation Junction, 
and denote by F the delay-coordinate map with M delays as defined in (4). Suppose that M > 2d, the 
A<&- eigenvalues {e ±J iTs } are distinct and strictly complex, 1 and vfh ^ Ofor all i = 1, ■ ■ ■ , d. Then for 
all distinct pairs of points x,y € M., F satisfies (5) for some constants C and 5 < 1. 

Proof: The proof of this theorem can be found in Appendix A. 

To explore the differences that arise in our specific setting of linear systems and linear observation 
functions, we compare the conditions of this theorem with that of Takens' theorem. First, we notice that 
the conditions on the measurement operation are very similar. Theorem III. 1 requires M > 2d, which is 
similar to Takens' M > 2d and likely only different because of the specific structure of our attractors. 
There is also a close correspondence with the other condition on the measurement function vfh / 0. 
This requirement is an explicit condition on the relationship between the system and observation function 
ensuring that the observation function can capture some information from every dimension of the attractor. 
We note that (Lebesgue) almost-every h € M N will satisfy this condition, and so we find that this is just 
a more explicit version of Takens' result that "almost-every" h ensures an embedding. 

Next, we compare our conditions on the system with those imposed by Takens' theorem. Theorem III.l 
requires that the A$> -eigenvalues are distinct and strictly complex, which is equivalent to having e J pTs / 
e ±j0qT s (distinct) an( j e jv P T s (strictly complex) for all p / q and p, q = 1, • • • , d. While this 

requirement implies 8 that M. does not have periodic orbits of period kT s for k = 1, • • • ,2d (thus 
satisfying Takens' condition), our condition is actually more stringent than this restriction on periodic 
orbits (likely due to our restricted class of linear observation functions). We note that since {9i}f =l are 
distinct by definition, this condition is dependent on the choice of sampling interval T s . One can verify 
that choosing T s < m ^ e .y is sufficient (but not necessary) to meet the condition of the theorem. 

7 We say that a number x is strictly complex if Im {x} / 0. 

8 This implication can be shown by contradiction. Pick any 1 < k < 2d and suppose that M has at least a periodic orbit of 
\& with period kT s . This would be equivalent to saying that e i<>pkTs = (e jepTs ) — 1 for all p, meaning that for each p from 1 
to d the quantity e ±j6pTa is uniquely one of the k roots of unity. However this is impossible as there are 2d distinct and strictly 
complex values of {e ±j0pTs } and there are only k <2d roots of unity (including ±1 which are not allowed), and hence we 
have a contradiction. 
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B. Stable Tokens' Embeddings 

Before presenting our main result giving conditions for a stable embedding of a dynamical system in a 
delay coordinate map, it will be useful to define and understand the following quantities that characterize 
how well-behaved the system and measurement process are both individually and jointly. First, we define 
ki = minj G { lj d y {^7} an d ^2 = max ie{i,...,d} j \h\ \ l } characterizing the minimum and maximum 
projection of the (normalized) observation function on the ^-eigenvectors. Roughly speaking, these 
quantities are an indication of the disparity between the dimensions of the system attractor that are 
best and worst matched to the observation function. One would expect that a measurement system is 
most efficient when it observes all parts of the attractor equally such that k\ k, K2- Second, we define 
Ai,A2 as the smallest and largest eigenvalues of V H V, respectively. As we discussed at the end of 
Section II-A, these quantities describe how well the system attractor fills the dimensions of the state 
space that it occupies (i.e., when A2 3> A\ the attractor is very elongated in the state space). Again, 
we would expect that a system will be most amenable to observation when it fills the space such that 
A x « A 2 . 

Finally, we define v := max < |sin(#pT s )| 

p¥=q { 

also bound the constants associated with the stable embedding. Notice that the first term is large if 9 p T s 
is small for some p (or that 9 p T s « kir for some integer k), meaning that the system state proceeds 
in the span of Re{v p } and Im{w p } at a slow pace, thus not producing much diversity in consecutive 
measurements of the system along these dimensions. The second term is large if 9 p T s — 9 q T s is small 
(or near kir) for some p / q and p, q = 1, • • • ,d, implying that the system state is proceeding in the 
subspaces spanned by Ke{v p } ,lm{v p } and Ke{v q } ,lm{v q } at almost the same rate. This condition 
would be unfavorable because the system will take an extremely long time to display enough diversity to 
determine that it is actually traveling on two separate subspaces instead of one. The third term is similar 
to the second term if we write 9 p T s + 9 q T s = 9 P T S — (—9 q T s ). Thus if 9 p T s ~ —9 q T s , then the system 
is again proceeding on two subspaces at almost the same rate (although the system is proceeding in one 
of the subspaces in the "opposite" direction). 

Armed with these definitions, we now present our main result giving deterministic, explicit and non- 
asymptotic guarantees on the conditioning of the delay coordinate map. 

Theorem III.2 (Stable Linear Takens' Embedding). Assume a linear dynamical system of class A{d) in 
R N that is in steady state. Choose T s > to be the sampling interval, h G to be the observation 



sin 



which will 
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function such that \\hW2 = jj, and denote by F the delay-coordinate map with M delays as defined in 

(4) . Suppose that M > ((2d — 1) i^j, the A^-eigenvalues {e ±: > eiTs } are distinct and strictly 
complex, and vfh 7^ for all i = 1, ■ ■ ■ ,d. Then for all distinct pairs of points x,y € A4, F satisfies 

(5) with constants C := d + and 5 := So + 5\(M), where: 

Proof: The proof of this theorem can be found in Appendix A. 

We first note that the sufficient conditions of this theorem are the same as those in Theorem III. 1 , 
except that the required number of measurements is larger to ensure specific guarantees on the conditioning 
number 5 (i.e. 5 < 1). Also, note that this theorem requires an observation function with a particular norm 
\\hW2 = |f. This normalization is to remove from C any dependence on the number of measurements M 
and the dimension of the attractor 2d (since k\ and rc| both scale inversely with d). The normalization 
plays no other significant role in the proof (and therefore could be eliminated without losing generality, 
but at the expense of clarity). 

To understand the implications of Theorem III.2, we examine the behavior of the conditioning number 
5 as it is the main quantity of interest. In the theorem statement, 5 is a sum of Sq (which does not depend 
on M) and 5\(M) which is positive for all M and for which liniM-^oo = 0. Thus, we see that by 

taking more observations one could drive the conditioning guarantee for the mapping to 5 = 5q, but not 
below. In other words, some system and measurement pairs will have a plateau preventing the conditioning 
guarantee for the delay coordinate map from improving beyond a fundamental limit. This is in contrast 
with CS results where the conditioning can be continually improved by taking more measurements. 
Indeed, in order to get arbitrarily good conditioning we would need 5q = 0, which happens if and only 
if A 2 kI - A\k\ = <=> ^ = ^| = 1. Recall that A\ = A 2 implies that the attractor M maximally fills 
the subspace spanned by V and k\ = k 2 means that the observation function h projects equally onto the 
^-eigenvectors. Thus even with an infinite number of measurements, the delay coordinate map can only 
be guaranteed to be an exact isometry (5 = 0) when the system and observation function maximally fill 
and measure the subspace containing the attractor. 

The quantity S\ (M) can be used to determine the number of measurements necessary to ensure that the 
conditioning number 5 is within e of the optimal value 5q. To find the required number of measurements 
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to meet this target M(e), we set S\(M) = e and solve (6) for M to get 

(2d- l)v ( 2A 2 k\ 

^A 2 k\ + Aik{ 



M(e) = ^____Z_ ( . £p ) , (7) 



By multiplying the numerator and denominator by and noting that < < 1, we can deduce 
that ^ 2d ~ 1 ^ u < M(e) < 2 ( 2d ~ 1 )' / _ One immediate application of this fact is that we can calculate the 
number of measurements necessary to guarantee a stable embedding for the delay coordinate map with 
a specified conditioning 5 € (So, 1), which is made precise in the following corollary. 

Corollary III.l. Suppose we have a linear system of class A(d), observation function h and sampling 
time T s such that the conditions of Theorem 111.2 are satisfied. Choose any < e < (1 — So). If the delay 
coordinate map F defined in (4) has a number of delays M chosen to satisfy M > 2 ( 2d ^ 1 ) v t then F is 
a stable embedding of M. with conditioning 5 < Sq + e. 

The proof of this corollary is not shown, but follows immediately from Theorem III.2. While the linear 
scaling with d seen in this result is in line with state-of-the-art CS results, we see that in contrast to 
typical CS results M(e) does not depend on the ambient dimension N. Also note that M(e) depends 
strongly on the ^-eigenvalues via the quantity v. In contrast, the interactions of the ^-eigenvectors and 
the observation function h determine the lower bound on the conditioning 5, as evidenced by the roles 
played by the quantities A\,Ai and /ci,/C2 in the formula for Sq. 

IV. Simulation experiments 

While the main result in Theorem III.2 is encouraging, it remains to be shown that (i) the theoretical 
quantities actually reflect the salient embedding characteristics seen in system and measurement combi- 
nations, and (ii) having a stable embedding actually improves our ability to infer information about a 
hidden attractor. For example, it is important to know if the fundamental limits on the embedding quality 
5(M) are artifacts of our proof technique or are empirically observed. If these limits on the embedding 
quality are actually present, it is also important to know if the related bounds are tight, both in their 
asymptotic values and in terms of their convergence speed as M increases. Finally, for a stable embedding 
to be a valuable goal, we need to demonstrate that achieving this goal results in improved performance in 
specific tasks performed in the reconstruction space. This section will use a series of simple simulations 
to explore these aspects of our theoretical results. 

As a general approach, each simulation in Sections IV-A and IV-B below involve creating an observation 
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function h and a test system of dimension N = 50 in class A(d) (denned by ^-eigenvalues and A- 
eigenvectors) so that the conditions of Theorem III.2 are satisfied. We choose the arbitrary initial point 
xo defining the attractor such that ao = [1, • • • , 1] T and xq = Vao, and we assume a sample time of 
T s = 1. For a single trial, we generate a random pair of points on the attractor x and y by choosing 
uniform random numbers t x ,t y from (0, 10000) and assigning x = Ve Atx cto and y = Ve Aty ao. In other 
words, we start the system from the (arbitrary) initial condition and stop it after a random amount of 
time to get a single point on the attractor. We then vary M from 1 to 200, and run 1000 trials for each 
M (renormalizing h for each M as per Theorem III.2). For each trial we calculate the quality of the 
conditioning Q(x,y) = ^ F ^~^f^ 2 , and for each M record the largest and smallest value of Q(x,y) 
(denoted maxjQ} and minjQ}, respectively) as a way to quantify how the conditioning changes with 
the number of measurements. In the subsequent plots the dotted lines represent C(l ± 8q), showing the 
theoretical asymptotic bounds on the conditioning quality Q(x, y), and the dashed lines are the theoretical 
bounds on the conditioning C(l ± 5{M)) given by Theorem III.2. 

A. Bounds on the embedding quality 

One of the fundamental characteristics of Theorem III.2 is that in general, the bound on the embedding 
quality 5(M) approaches Sq / as M increases rather than approaching zero as is typical in CS results. 
The first question to ask is whether pairs of systems and observation functions can actually display such a 
plateau as predicted, or whether the conditioning instead continually improves with more measurements. 
To demonstrate this effect, we generate a simulation as described above with d = 3, choosing the 
^-eigenvalues {9i}f =1 uniformly at random from (0, n), and taking care to ensure that the resulting A®- 
eigenvalues are distinct and strictly complex to satisfy the conditions of Theorem III.2. We then create the 
^-eigenvectors by letting Vi = ^(e2j-i + je2i), where {e^} are the canonical basis vectors in M. N . This 
choice of ^-eigenvectors ensures that A\ = A<i- To generate a generic observation function h, we first 
create a vector c € R N such that c = Yli=i((^ + w 2i-i) Re {v j} + (1 + W2i) Im {««}), where the {wi} are 
i.i.d. Gaussian random variables of zero mean and variance 0.1. Thus c is a (random) linear combination 
of the vectors that form the subspace of the attractor. For each M we let h = h(M) = \J~j$j^ so 
that \\h\\l = jj to meet the conditions of Theorem III.2. Note that the small variance of {wi} produces 
{luf^/ip/INII} centered tightly around 1, making So small (due to A\ = A2 and m, K2 both close to 
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max{Q} 
min(Q) 
C(1 ±5(M)) 



C(1 + 5 ; 

•c 

.c(i-6„: 



100 150 
M 



(a) (b) (c) 

Fig. 3. Simulations exploring the asymptotic bounds on the conditioning of the delay coordinate map. Plotted are the 
largest and smallest value of Q(x, y) (depicted by max{Q} and mm{Q} respectively) attained by the 1000 pairs of 
x, y for each M . The dotted (red) lines represent the values ofC(l ± Sq) and C, and the dashed (black) lines are the 
theoretical values of C(l ± S(M)). (a) In this simulation, A\ = A2 but k,\ ^ K2, thus a plateau on the conditioning 
is seen, (b) In this simulation, A\ = A2 and k\ = k,2- As expected, the conditioning number asymptotically reaches 
as M grows, (c) In this simulation, A\ 7^ A2 and n\ ^ K2 and the predicted asymptotic values of the conditioning are 
not tight. 

I). 9 The specific parameters in this simulation are shown in Table I. 
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TABLE I 

Parameters for the simulation shown in Figure 3(a). In this case the relevant quantities are A\ = A2 = 1, 
Ki = 0.8346, k 2 = 1.1637, v = 5.6954 and S = 0.1647. 

The results for this simulation are shown in Figure 3(a). We see from the behavior of max{Q} and 
min{Q} that the embedding does indeed reach a fundamental limit where the conditioning does not 
improve with more measurements. Furthermore, we see in this case that this plateau is correctly captured 
by the value C(l ± 5q) as described in Theorem HI.2. Additionally, the bounds C(l ± <5(M)) do contain 
max{Q} and min{Q} as expected from the theorem, and the characteristic shape of these curves seems 
to qualitatively reflect the empirically observed convergence of the conditioning number. 

As confirmation, we also verify the implication of Theorem III.2 that system and measurement combi- 
nations can be constructed where the conditioning can be made arbitrarily good with more measurements 
(akin to the more typical CS results). To show this, we create another system with the same ^-eigenvalues 
and ^-eigenvectors as in the previous simulation, with the latter implying that A\ = A2. For the 



observation function, we first define c = V[l, • • • , 1] T , and for each M we let h = h(M) = \ljjj§ir 



as 



9 The random variables {wt} are used to ensure that «i, K2 are close to, but not exactly equal to 1. The case where ki = Ki = 1 
is considered in the simulation in Figure 3(b). 
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before. One can verify this choice results in |t^/i|/||/i||2 = 1 for all i, and thus n\ = ki- The parameters 
of this experiment are summarized in Table II. 
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TABLE II 

Parameters for the simulation shown in Figure 3(b). The experiment was chosen such that A\ = A 2 = 1 and 
«i = K 2 = 1, so that S = 0. As the A-eigenvalues are the same as in the previous experiment, v remains at 5.6954. 

With this choice of parameters such that A\ = A2 and k\ = K2, Theorem III.2 indicates that 5o = so 
that liniAf^oo S(M) = 0. Figure 3(b) shows the results of running the simulation in the same manner as 
before. The values of max{Q} and minjQ} clearly converge to C as expected, showing that in this case 
the conditioning of the embedding can indeed be made arbitrarily good by taking more measurements. 

Although Theorem III.2 indicates that a finite limit on the conditioning number is always reached 
when either A\ / A2 or k\ / K2, this bound is not always tight and the predicted plateau level of 
C(l ± So) may be conservative. To show this, we construct a similar simulation as above, now setting 
the ^-eigenvectors to be V{ = = = =(aj + jh), where {ai,bi} are randomly constructed vectors 
in R N whose entries are i.i.d. zero-mean Gaussian random variables with a variance of 1. We keep the 
^-eigenvalues the same and generate h in the same manner as the first simulation shown in Figure 3(a). 
The specific parameters for this simulation are shown in Table III, where we see that indeed A\ / A2 
and K\ / K2- Figure 3(c) shows the results of running the simulation in the same manner as before. 
We see that although a limit on the conditioning number is reached as predicted by Theorem III.2, the 
predicted plateau level of C(l ± <5o) is not tight and the conditioning can be better than that predicted 
by 6 . 
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0.4315 



TABLE III 

Parameters for the simulation shown in Figure 3(c). We see that A\ = 0.4315, A 2 = 1.5316, Ki = 1.1318 and 
K2 = 1.8138. Since the A-eigenvalues are the same as in the first simulation shown in Figure 3(a), v remains the 

same at 5.6954. We also calculate 5q = 0.7010. 
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(b) 

Fig. 4. Examining the effect of the A-eigenvalues on the convergence speed of the conditioning, (a) In this simulation, 
d = 1 and we test = ^ , and |g. As expected, the closer 9 is to tt/2, the faster the rate of convergence ofS(M) 
to 6q. (b)In this simulation, d = 3 and we vary between 3 sets of A-eigenvalues with different values of v. As expected, 
the set of eigenvalues that gives the smallest v provides the fastest rate of convergence ofS(M) to Sq and vice versa. 

B. Convergence Speed 

In the simulations of the previous section we concentrated on the conditioning limits predicted by 
Theorem III.2, ignoring issues of the speed of convergence to those limits. Examining the formula for 
5i(M) in Theorem III.2, we see that the ^-eigenvalues (via the parameter v) affect the convergence 
speed of 5{M) to its asymptotic value of Sq. In particular, the convergence speed scales with 1/z/, which 
is also demonstrated in (7) where the number of measurements M(e) necessary to get the conditioning 
5 within e of the best possible value (5q) is proportional to v. 

For ease of analysis, we first consider the case where d = 1, meaning that v = |sin(0)| _1 (since 
T s = 1), where ±jd are the sole ^-eigenvalues. In this case, | sin(#)| -1 > 1 with the minimum attained 
when 9 = ^ + kn for any integer k. The closer 9 is to ^ + kir, the faster the convergence of 5{M) to 5o- 
This is illustrated by the following simulation where the ^-eigenvectors are chosen such that A\ = A2, 
and the observation function is chosen randomly as in the experiment shown in Figure 3(a) (except with 
d = 1). Figure 4(a) plots maxjQ} and min{Q} for 9 = ^rj, ^ and Jj, showing that Theorem III. 2 
correctly captures that the convergence speed to the asymptotic value of C(l ± <5o) varies inversely with 
the value of 9. 

When d > 1, the joint relationship of the ^-eigenvalues (not just their individual values) determines v, 
and subsequently the convergence speed. One can see intuitively in the definition of u that ^-eigenvalues 
which are maximally spread out should produce favorable convergence speeds. To illustrate this, we 
generate a simulated system with d = 3, choosing the ^-eigenvectors such that A% = A2, and generating 
an observation function h randomly (as in the experiment in Figure 3(a)). We also choose three sets of 
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(a) (b) 

Fig. 5. Examining the predicted number of measurements necessary to reach a specified conditioning level, (a) Plotted 
is the upper-half of Figure 3(a), also indicating C(l + 5q + e) with e = 0.2. (b) In this simulation, we explore how 
M(e) (for a fixed e = 0.1) varies with the A-eigenvalues for the system defined in Figure 4(a). We plot the theoretical 
values of M(e) (given in {!)) for varying from to ir/2 together with its actual values (as described in the text) 
obtained by running experiments for each 8. 

^-eigenvalues: two uniformly random sets, and one set that are slight perturbations of equally spaced 
points around the unit circle according to 9 P = J^- (the choices of 8 P and their respective v are given 
in Table IV). 10 Figure 4(b) shows the results of the simulation, with the maxjQ} and minjQ} curves 
showing clearly that v indeed controls the speed of convergence of 6(M) as predicted. 
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Set 1 (nearly equal spacing) 


0.7836 


1.5864 


2.3566 


2.6619 


Set 2 (random) 


0.0491 


1.5737 


2.3490 


20.3851 


Set 3 (random) 


0.0212 


1.5684 


2.3549 


47.1388 



TABLE IV 

Choice of {9i} (in radians) for the experiment in Figure 4(b) and their respective v value. 

Given that Theorem III.2 seems to be correctly capturing the convergence speed dependence on v, 
the last facet of the problem to explore is the tightness of this bound. Specifically, given a system of 
class A(d) and an observation function h, it is often of interest to estimate the minimum number of 
measurements ^M(e)^ needed to ensure that for any M' > M the conditioning number 5{M') is at 
most e above the asymptotic level of 5q (such an estimate is given in (7)). To examine this, we refer 
back to the simulation shown in Figure 3(a) with parameters given in Table I. Fixing e = 0.2, Figure 5(a) 
re -plots max{Q} together with the line C(l + 5o + e). Using the given parameters and (7) we calculate 
that M(e) » 166. Note that this value is also the intersection of the curve C(l + 5{M)) with the 



The slight perturbation is used for plotting convenience so all three curves converge to the same asymptotic value. If exactly 
equally spaced eigenvalues are used, the attractor is sampled uniformly and the convergent value will be inside C(l ± <5rj), 
making comparative plots difficult. 
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line C(l + Sq + e). Figure 5(a) shows that max{Q} actually met this tolerance with only around 30 
measurements. Thus, although the theoretical value of M(e) given by (7) is correct, it is pessimistic in 
at least this particular case. 

To demonstrate that the linear dependence of M(e) on v is correctly captured in the theorem, we 
restrict ourselves to d = 1. Recall that when d = 1, v = |sin(#)| -1 (since T s = 1) where ±j0 are 
the sole ^-eigenvalues. We repeat the simulation shown in Figure 4(a), this time using 100 values of 
9 equally spaced between (0, tt/2). Fixing e = 0.1, for each value of 9 we note the value of M where 
for all M' > M, max { ma ^ {Q} - 1, 1 - 2H£^2i j < $ + e. We call this value the "actual" M(e), in 
contrast to the "theoretical" M(e) given by (7). Figure 5(b) shows these actual and theoretical values of 
M(e) as a function of 9. This comparison shows that while the theoretical M(e) captures the same trend 
as the actual M(e), the theoretical estimate can be pessimistic compared to the empirical values (though 
it is not clear if the theoretical bounds are achieved by some systems). 

C. Stable Embeddings for Dimension Estimation 

To demonstrate the value of stable Takens' embeddings, this section will explore a simulated task esti- 
mating the dimensionality of an attractor. The correlation dimension is a measure of attractor dimension 
often applied to strange attractors of chaotic systems [25], which corresponds to the actual geometric 
dimension of regular objects such as the circles and ellipses seen in linear system attractors [3]. To be 
precise, we first define the correlation sum of tolerance e for a set of points {xu} lying on a subset M 
and temporally related via the flow (i.e., x^ = <3? fc xo) as 

2 K K 

C(e,K) ■■= K(K _ 1) Y1 E Q(e-\\F(x p )-F(x q )\\ 2 ), (8) 

p=l q=p+l 

where F is the delay coordinate map and &(■) is the Heaviside step function defined as 0(x) = if 
x < and Q(x) = 1 if x > 0. The correlation dimension is defined as D = lim e ^o limftr^oo 
This makes intuitive sense as in the limit of small e and large K, we expect C(e,K) to scale like 
C(e,K) oc e~ D , where D is the dimension of the subset M in question. Theoretically, one way to 
estimate correlation dimension is to plot the graph of logC(e, K) against loge for a large value of K, 
then simply read off the gradient for small values of log e. In the absence of noise and with a topology 
preserving Takens' embedding (i.e. M > 2d), this estimate should be as good as if one had access to the 
hidden system state. However, when noise is present, small values of log e will be capturing the noise 
characteristics and overestimating the attractor dimension. A common approach in this case is to plot the 
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Fig. 6. Estimating the correlation dimension of a circular attractor M. of a linear system of class A(l). (a) The 
conditioning of the stable embedding decreases with increasing number of measurements M . (b) The graphs of D(e) 
for the various M considered are plotted against log e. The correlation dimension estimate can be read off the plateaus 
in these graphs. These plateau regions become more distinct with increasing M (improving conditioning), and appear 
to converge to a value near the true dimension of 1. 

local gradient D{e) = aio | 1 ^ e J X ^ against log(e) for a large value of K and read off an estimate of the 
correlation dimension D from a plateau in the graph, preferably in the regime of small e. 

In this section, we use the above approach to estimate the correlation dimension of linear system 
attractors Ai in the reconstruction space M. M . For this simulation construct a linear dynamical system of 
class .4.(1) with N = 100, ^.-eigenvalue # = 3^5 and ^.-eigenvector v = [1, j] T (resulting in A\ = A2 
and a circular attractor). We also choose h = [1, 1] T , implying that k\ = K2 and subsequently 5q = 0. 
Figure 6(a) shows that the actual conditioning 11 of F approaches zero as we increase M. To simulate 
noisy measurements, we corrupt the resulting time series formed by h by adding white gaussian noise 
with zero mean and standard deviation a = 0.05 (to give an SNR of about 32dB). 

Figure 6(b) shows the plots of D(e) against log(e) with a number of delays M = 3, 73, 153, 223. For 
the graph corresponding to M = 223, a plateau is easily seen between — 1 < log e < 0, and corresponding 
to a correct dimension estimate of approximately 1. We observe that by taking more measurements (i.e., 
improving the conditioning of the embedding), the estimate of the correlation dimension also improves. 
Moreover, the width of the plateau region where we read off the correlation dimension estimate increases 
with increasing M, thus making its estimate more precise. Note that when we take the minimum number 
of measurements M = 3 required by Takens' Theorem, there is no discernible plateau region in Figure 
6(b) for us to estimate the correlation dimension, and even the most reasonable estimate near log e = 1 
is less accurate than with the estimates produced by the embeddings with better conditioning. 

"By actual conditioning, we mean the empirical value 8 = max'! ma ^|'3} — 1, 1 — ml "Wl I for Q defined in Section IV. 
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V. Conclusion 

The main result of this paper has established that a delay coordinate map (using linear observation 
functions) can form a stable embedding for all pairs of points on the attractor of a linear dynamical system 
of class A(d). The explicit, deterministic and non-asymptotic sufficient conditions we give for this stable 
embedding yield several observations about the embedding itself and favorable properties of system and 
measurement pairs. For example, for many system and measurement pairs, the conditioning number 
S(M) reaches a non-zero asymptotic value of 5q with increasing M. This "plateau effect" is in contrast 
with typical CS results where the conditioning of the stable embedding can be continually improved by 
increasing the number of measurements. Furthermore, the convergence speed of the embedding quality 
to this limit is governed by the joint relationship of the system eigenvalues, which capture the relative 
speed with which the system explores the different dimensions of the state space (i.e., more diversity 
in these speeds results in faster convergence). Finally, we also see that the minimum number of delays 
M of the delay coordinate map scales linearly with the attractor dimension but is independent of the 
system dimension. This is again in contrast with typical CS results, where the number of compressive 
measurements also scales logarithmically with the system dimension (but interestingly does parallel recent 
improvements in these bounds for the stable embedding of manifolds [15]). 

While the comparisons with standard CS results reveal these interesting and non-intuitive technical 
differences between the results in each case, these discrepancies actually point to a much deeper difference 
in the problem setups that must be appreciated when embedding attractors of dynamical systems. Perhaps 
the easiest way to see this is to consider that in the present case of delay coordinate maps, while 
the number of measurements doesn't scale with the ambient system dimension, the total number of 
measurements may in fact have to be larger than the system dimension (M > N) in order to make a 
particular conditioning guarantee. In the typical CS case, this would of course be a ridiculous proposition. 
If the RIP property required (M > N) random measurements (e.g., due to very large constants in the 
typical sufficient conditions), one would likely abandon the CS strategy and simply take N uncoded 
measurements (e.g., in the canonical basis). However, in the case of delay coordinate maps for dynamical 
systems, this luxury is simply not available. For example, the observers often do not have any control 
over the choice of observation function h, and in these cases cannot simply change the way the system 
is measured. But, more importantly, even if we were given complete control over h, it is only a "seed" 
that is used in producing the whole measurement process. One can view the entire set of measurements 
as being generated by repeatedly forcing this observation function through the dynamics of the system 
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(seen explicitly in writing the delay coordinate map in (4)). Said another way, because there is only a 
single observation function for the system, the total measurement process for a delay coordinate map is 
beholden to the dynamics of the system itself to provide sufficient diversity to make the measurements 
informative. Therefore, even with complete control over the observation function, delay coordinate maps 
represent a highly restricted total measurement process that cannot be completely controlled (without 
access to and control over the system that is hidden and in need of measurement). 

Characterizing the delay coordinate map embeddings for attractors of linear dynamical systems with 
linear observation functions is a subset of the more general problem of characterizing these embeddings 
for attractors of nonlinear systems and general observation functions. From the results here, we conclude 
that there is reason to be optimistic that similar stability results can be obtained for this more general 
case of interest. Furthermore, these results also lead us to conclude that there are several issues that differ 
from standard CS results and will need to be carefully considered in any generalization. 

Appendix A 

Proof of Stable Takens' Embedding Theorem 

Because Theorems III. 1 and III.2 are very similar in structure, we will essentially lay out the proof 
approach for both of them together in this section and then separately establish the necessary details for 
each result. Before proceeding with the specific proofs, we will introduce some notation and preliminary 
results that will be useful. 

A. Notation and preliminaries 

1) Frame theory: Drawing on some terminology from the field of frame theory, we say that a sequence 
of vectors {gi}f£ 1 in C K , M > K, forms a frame [26] for C K if there exists two real constants 
< B 1 < B 2 < oo such that for all a G C K , Bi||a||| < Y^Zi \(gi, a)\ 2 = \\Ga\\l < B 2 \\a\\l, where 
G H = (gi | g 2 | ••• | gu) G C KxM , the concatenation of the {gi}f£i, is called the frame analysis 
operator and B\ , B 2 are called the frame bounds. The frame bounds can be defined as B\ = A m j n and 
B 2 = A max , where A m i n and A max are the minimum and maximum eigenvalues of G H G G C KxK . 

2) Linear delay coordinate maps: Because the attractor M is contained in the span of the columns of 
V, for any x, y G M we can write x = Va x and y = Va y for some complex coefficients a x ,a y G C 2d . 
Using F to denote the delay coordinate map for a linear system with flow matrix $ and observation 
function h as described in (4), the k-th row (for k = 1, ■ ■ ■ , M) of the vector F(x) — F(y) can be written 
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h T ($*-i( x _ y )) = h T (&-± V {a x - ay)) = h T (VD k ~\a x - a y )) = (g k , a x - a y ), where 
g*= h T VD k - 1 = \{vlh)e^ k -^ T %{v?h)e^ k -^ T %. . . , (vj h)e^ k - 1 ^ T ' , {v^h)e^ e ^ 



(9) 



and D is the diagonal matrix comprised of .4$ -eigenvalues as defined in Section II-A. Thus, we have: 
\\F{x) - F(y)\\l = Y,k=i \(9k, (a x - a y ))\ 2 = \\G(a x - a y )\\ 2 2 , where G G C Mx2d is the concatenation 
of {g^} as described above. In this following, G is fixed to be this matrix given here. 

3) Eigenvalue bounds: It will be important in the following proofs to determine bounds on the 
extreme eigenvalues of the matrix G H G. To that end, we first introduce the well-known Gershgorin 
Circle Theorem, which we state here for notational convenience: 

Theorem A.l (Gershgorin Circle Theorem [18]). The eigenvalues of a K x K matrix A all lie in the 
union of the Gershgorin disks of A. The Gershgorin disk T>i for i = 1, • • • , K, is defined as T>; L = 
{xeC : \x — Ci\ < ri\ , where ri := Ylf-i j^i \ is the radius, and C\ := (^4)j,j is the center of 

the i-th disk. Thus X(A) C where X(A) = {X±, • • • , A^}, and {A«} are the eigenvalues of A. 

To apply the Gershgorin Circle Theorem to obtain the extrema eigenvalues of G H G, we introduce the 
following useful lemma that gives values for centers C L and radii r; L of the Gershgorin disks X>j of G H G. 

Lemma A.l. For i = 1, ■ ■ ■ , d, the centers of the Gershgorin disks of G H G are C 2 i-\ = C 2 i = \v^h\ 2 M 



while their radii are r 2i -\ = r 2i = \vfh\ 2 ^g^y + Y, P =i, P ^i\ v f h \\ v p h\ 



sin(M(6>,-e p )T 3 /2) 
sin((6» i -e p )r s /2) 



+ 



sin(M(6>,+6> p )T a /2) 
sm((e z +e p )T s /2) 



Proof: We can write G H G = Ylk=i 9k9k ' •> wnere we recall that is defined as in (9). Thus the (p, q) 
entry of G H G can be expressed as: (G H G) P:q = YlkLi 9k(p)9k(q)*, where Qk{p) denotes the p-th entry 
of the vector g^. As such, the formation of G H G involves the calculation of sum of complex trigonometric 
polynomials due to the complex exponentials ({e ±j, ( fc ~ 1 ) 6 ' pTs }) appearing in the terms of each g^. A few 
separate cases need to be considered because of the differences in the even (2p) and odd (2p— 1) numbered 
rows of G H G for all p. We first consider the even numbered rows. The diagonal terms actually have 
a fairly simple form: (G H G) 2p , 2p = Ek=i9k(2p)9k(2p)* = T,k=iK h \ 2 = M\v^h\ 2 . The adjacent 
term to the left is given by: (G H G) 2p , 2p ^ = E^ 1 {(v T p h) e -^f = (v^h) 2 J^ 1 {e~^ T -) k = 
(Vph) 2 ^s^g 9 ^^ e -j(M-i)e p T s ^ w h ere ^ i as t expression follows from the standard formula for a finite 
geometric sum, pulling out common exponential factors, and using Euler's formula. The other cross terms 
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for all p, q G {1 



. , d} such that p / q can be derived similarly as: 



(G G) 2p ,2q-1 



(G G) 2 p,2q 




(^M e - J -(M-l)(^)T 
Vr.) 



£M P -3(M-1)(^)T 8 



The relevant quantities for the odd numbered rows are given similarly as 



l,2p-l 
-l,2p 



(G H G) 2p , 2p = M|^| 2 , 

{G H G)* 2 p2p-1 = (v^hf 




(G G) 2 p-l,2q 



(G G) 2p _i j2g _i 



Finally we note that many of the above complex quantities only differ in their phase because of symmetry 
in the summations, making their magnitudes equal when calculating the radii of the Gershgorin disks. The 
expressions for C L and fj in the lemma are obtained simply by applying the notation of the Gershgorin 



B. General proof approach 

Using the preliminaries above, we can now sketch out the general approach for the proof of both 
theorems below. Essentially, the theorems result from using (or establishing) the following three facts: 

1) If G H G G C 2dx2d is established to be full rank, then {g k }k=i form a frame in cM - Thus there 
exists < B 1 < B 2 < oo such that B 1 < [£teh*Mlla < B 2 holds for all distinct pairs of points 



x, y e M.. In particular, to establish conditioning guarantees, we can let B\ and B 2 be the smallest 
and largest eigenvalues of G H G (respectively) and determine bounds on those important quantities. 

2) Next, we use the fact that \\x - y\\j = (a x - a y ) H V H V(a x - a y ) to get A l < ^2^% ^ A ^ 
where A\ and A 2 are the smallest and largest eigenvalues of V H V € C 2dx2d respectively. By the 
definition of V we know that V H V is well-defined and full rank, meaning that < A\ < A 2 < oo. 

3) Putting the 2 previous steps together, we get < ^ < ^p^rp^ 2 < ^ < oo, where the bounds 
^ and ^ can be manipulated to get the scaling constant C and conditioning 5 in (5). Specifically, 



Circle Theorem to the calculated magnitudes of the entries of G G. 
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C. Proof of Theorem III. 1 

Proof: For Theorem III. 1, we follow the three steps detailed in Appendix A-B, where we only need 
to show that G H G is indeed full rank given the conditions of the theorem. Consider first the case when 
M = 2d, where showing G H G is full rank is equivalent to showing det(G^G) = det(G) 2 > 0. The 
matrix G can be expressed in terms of a product of a Vandermonde matrix and a diagonal matrix: 



G 



l l ■■■ l l 

,-j«\T„ e j0 1 T s ... e -j6 d T s £ jS d T s 



e -j2de 1 T s e j2dB 1 T 3 ... e ~j2dB d T 3 e j2dB d T 3 



/vfh (0) \ 

v?h 



v d h 

\ (0) vfhj 



= M T H, 



where M is the Vandermonde matrix with the A$ -eigenvalues as its parameters and H is a diag- 
onal matrix whose diagonal elements are made up of the projection of h onto the ^-eigenvectors. 
Thus, det(G) = det(M) det(H). One of the conditions of Theorem III. 1 ensures that the {e ±jWr *}f =1 
are distinct, which implies that the determinant of this square Vandermonde matrix [27, Ch 0] obeys 
| det(M)| > 0. Also since vfh ^ for alH = 1, ■ ■ ■ , d, we also know that | det(i?)| > 0. Therefore for 
M = 2d, rank(G H G) = 2d. Since adding vectors to a frame does not change the rank of G H G (i.e., 
frame bounds cannot be lowered by adding more vectors to the frame), it follows that if M > 2d then 
rank(G^G) = 2d and the proof of Theorem III. 1 is complete. ■ 

D. Proof of Theorem 111.2 

Proof: To prove Theorem III.2, we again follow the three steps detailed in Appendix A-B, this 
time establishing specific guarantees on the frame bounds B\(M) and F>2{M) appearing in the first 
step. From Lemma A.l, we first observe that for all i we can bound the Gershgorin disk radii by 

r 2i _! = r 2i < (\v?hf + EU,p^ \ v ? h \K h \ + Ej=i >p/i K^lK^l) * < (2d-l)4\\h\&. Noting 
that \\h\\2 = jj, we see that for each i, the Gershgorin disks of G H G satisfy X>2i-i = T^2i C 
[\vf h\ 2 M - \\h\\l(2d- \vf h\ 2 M + \\h\\l(2d- 1)vk%\ . Then applying the Gershgorin Circle The- 

orem, we get \(G H G) C \jfVj C [2ck? - j£(2d - \)vk\, 2&k\ + |f (2d - l)vnl] . By choosing 
Bi(M) = 2d (kI - i2d ~$ VK * } and B 2 {M) = 2d (k% + (2d ~^ )l/w ' ), and applying step 2 in Section A-B, 
we arrive at: 

Bi(M) < \\F(x) - F(y)g < B 2 (M) 
A 2 ~ \\x - y\% Ai 

for all distinct pairs of points x, y G M and for all M. 
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Now as M ->■ oo, Bi(M) ->• 2ckf and S 2 (M) -)• 2ek|. Thus in the limit of large M, the lower 

2dK 2 2dK 2 

and upper bounds of the inequality (10) approaches and -%f, respectively. We define the scaling 
constant C as the average of the asymptotic values of these lower and upper bounds: C := ^ + -^-Y 
Also define the conditioning number <5(M) for a given M as the maximum deviation of the lower and 
upper bounds of (10) from C, normalized by C: 5{M) := max jl — B qa 2 , — l} • Now 1 — 

Bi(M) _ 1 _ 2rf(Kf-(2rf-l)z//c|/M) _ A 2 K 2 2 -A 1 K 2 1 +2A 1 {2d~l)vK 2 2 /M , B 2 (M) _ 1 _ 2d(«l+(2rf-l)^/M) _ 
CA 2 - 1 (2rf/2)(/s?+^(^Mi)) ~~ A 2 k|+Ai/c? ' ana ~CA~T 1 ~ (2d/2)((A 1 /A 2 )K 2 +ni) 

1 = A ^t^ { A^ )UKllM - S^e Ai < A 2 , we have that 5(M) = - 1 = + 

a 2 2 ^+a 1 k 1 <y2d ~hP U ■ ^ e can tnen define <5o and <5i(M) as the first and second term of the sum above. 
Notice that S(M) represents a worst case bound on the deviation from C, as we maximized over upper 
and lower bounds that may not be the same magnitude (i.e., in general C(l — S(M)) / Bl ^ )- 

Finally, we recall that for the embedding conditioning number to be valid, we must have < 5(M) < 1. 
The first condition S(M) > is achieved by construction. The upper bound is equivalent to the condition 
for M required by the theorem statement, thus completing the proof. ■ 
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